Before presenting any maps or plots, we define the two variables of interest. The CalEnviroScreen 4.0 report defines the PM2.5 and Asthma measurements used below as follows:
PM2.5: Annual mean concentration of PM2.5 (weighted average of measured monitor concentrations and satellite observations, µg/m3), over three years (2015 to 2017).
Asthma: Spatially modeled, age-adjusted rate of ED visits for asthma per 10,000 (averaged over 2015-2017).
This map shows higher PM2.5 concentrations in low-lying and urban areas. The East Bay also appears to have more PM2.5 than any other large section of the Bay Area.
Visual inspection of this map reveals a general trend similar to that of the PM2.5 map (higher prevalence in the East Bay). Here, however, the ‘hot spots’ are much more contained, and include places (cities, towns, and neighborhoods) such as East Oakland, West Oakland, Richmond, Vallejo, and Antioch, among others.
Our regression line captures the generally positive trend in the data points, but the plotted points do not look linear. Asthma rates rise more rapidly as the PM2.5 concentration goes from 8 to 9, which suggests a relationship closer to exponential. There are also many outliers, and the data are widely spread, particularly at the lower and higher ends of the PM2.5 range.
##
## Call:
## lm(formula = Asthma ~ PM2.5, data = bay_pm25_asthma_tract)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -54.47 -25.89  -9.61  12.94 182.95
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.49 on 1578 degrees of freedom
## Multiple R-squared: 0.09606, Adjusted R-squared: 0.09549
## F-statistic: 167.7 on 1 and 1578 DF, p-value: < 2.2e-16
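Comments: The R^2 value is quite low (0.096). The residuals are not symmetric about 0: the median is -9.61, while the range runs from about -54 to about 183, so there is a long right tail. A side point: the intercept is strongly negative, and a negative asthma rate is impossible; the intercept is the model's prediction at a PM2.5 concentration of 0, so it is an extrapolation rather than a meaningful estimate. The slope of the regression line is far from 0 (19.862 with a standard error of 1.534), so the data give strong evidence against the null hypothesis of no linear association. The very low p-values show that the association is statistically significant; they do not show that the model fits well. In words: “An increase of 1 µg/m3 in PM2.5 is associated with an increase of 19.862 in Asthma (ED visits per 10,000)”; “about 9.6% of the variation in Asthma is explained by the variation in PM2.5”.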
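As a quick sanity check on the comments above, the headline statistics can be recomputed by hand from the printed coefficients. This is only a sketch: the numbers are copied from the summary output rather than taken from a refit model, and the variable names are made up for illustration. It also shows where the fitted line crosses zero (about 5.85 µg/m3), which is the PM2.5 level below which this model would predict a negative asthma rate.

```r
# Values copied from the summary() output above; nothing is refit here.
intercept <- -116.278
slope     <- 19.862
slope_se  <- 1.534
df_resid  <- 1578
r_squared <- 0.09606

# t value for the slope: estimate divided by its standard error.
t_value <- slope / slope_se

# 95% confidence interval for the slope, using the t distribution.
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se

# F statistic for a one-predictor model, recovered from R-squared.
f_stat <- r_squared / (1 - r_squared) * df_resid

# PM2.5 level at which the fitted line crosses zero; below this the
# model predicts a negative (impossible) asthma rate.
pm25_zero <- -intercept / slope

print(round(c(t = t_value, F = f_stat, zero_crossing = pm25_zero), 3))
print(round(slope_ci, 2))
```

The recomputed t and F values should agree with the printed ones up to rounding of the inputs.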
We see here that the residuals are not symmetric around 0: most of them fall below 0, and there is a long right tail.
This looks a little better. The curved (exponential) shape of the original plotted points has flattened out, though only a little. Visually, the regression line seems to cut through the center of the points better, which suggests that the residuals will be more symmetric about 0.
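To illustrate why a log transform can help in a situation like this, here is a small sketch on simulated data. It does not use the CalEnviroScreen tracts: the data-generating values (0.7, 0.35, noise standard deviation 0.65, PM2.5 between 6 and 10) are assumptions chosen only to resemble an exponential relationship, and `skewness()` is a helper defined here, not a base R function.

```r
# Simulated data only: NOT the CalEnviroScreen tracts.
set.seed(1)
n      <- 1580
pm25   <- runif(n, min = 6, max = 10)
asthma <- exp(0.7 + 0.35 * pm25 + rnorm(n, sd = 0.65))

# Fit the same two model forms used in this analysis.
fit_raw <- lm(asthma ~ pm25)
fit_log <- lm(log(asthma) ~ pm25)

# Sample skewness: 0 for a symmetric distribution, > 0 for a right tail.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(residuals(fit_raw))
skew_log <- skewness(residuals(fit_log))

print(round(c(raw = skew_raw, log = skew_log), 2))
```

When the underlying relationship is exponential with multiplicative noise, the raw-scale residuals come out strongly right-skewed while the log-scale residuals are close to symmetric, which is the pattern the plots here hint at.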
##
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = bay_pm25_asthma_tract)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.00402 -0.46479  0.03313  0.42298  1.75525
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.69234    0.22840   3.031  0.00248 **
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6566 on 1578 degrees of freedom
## Multiple R-squared: 0.1003, Adjusted R-squared: 0.09974
## F-statistic: 175.9 on 1 and 1578 DF, p-value: < 2.2e-16
After the log transform, the residuals are much more symmetric about 0 (median 0.03, ranging from about -2.0 to about 1.8). R^2 is nominally a little higher but still low (0.10); because the response is now log(Asthma) rather than Asthma, the two R^2 values are not strictly comparable.
We see clearly that the residuals are now much more evenly distributed around 0. There is an interesting double hump, which may indicate some structure in the data that the model does not capture.
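Because the response is now on the log scale, the slope is easier to read as a multiplicative effect. The sketch below back-transforms the printed coefficients; the values are copied from the summary output, and `predict_asthma()` is a hypothetical helper introduced here for illustration, not part of the original analysis.

```r
# Coefficients copied from the log-model summary above.
log_intercept <- 0.69234
log_slope     <- 0.35633

# Multiplicative change in the asthma rate per +1 ug/m3 of PM2.5.
rate_ratio <- exp(log_slope)
pct_change <- (rate_ratio - 1) * 100

# Hypothetical helper (not in the original analysis): back-transform a
# prediction from the log scale to ED visits per 10,000.
predict_asthma <- function(pm25) exp(log_intercept + log_slope * pm25)

print(round(pct_change, 1))
print(round(predict_asthma(c(8, 9)), 1))
```

Under this model, each additional 1 µg/m3 of PM2.5 is associated with roughly a 43% higher asthma rate. Note that exponentiating a log-scale prediction gives something closer to a typical (median-like) value than to the mean on the original scale.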
The tract with the most negative residual (about -2 on the log scale) is Stanford, CA! A negative residual means that the observed asthma rate at Stanford was lower than the model predicted. In other words, even after the log transformation of the Asthma measurement, the model substantially overestimates the asthma rate at Stanford given the amount of PM2.5 in the air. One possible cause is that Stanford’s population is rather transient. Either people stay for too short a time for the air quality to affect their health (and cause asthma), or the transient student population does not stay long enough for its health conditions to be captured in the collected data. Note that Stanford’s PM2.5 value is ~8, a mid-range value just under the mean and median for the Bay Area.
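To put the size of this residual in original units, the log-scale numbers can be back-transformed. This is a rough sketch under two assumptions: Stanford's PM2.5 is taken as exactly 8 (the text only gives "~8"), and its residual is taken to be the minimum printed in the log-model summary (-2.00402). The variable names are illustrative.

```r
# Coefficients copied from the log-model summary above.
log_intercept <- 0.69234
log_slope     <- 0.35633

# Assumptions: PM2.5 is only known to be "~8"; the residual is taken
# to be the minimum reported in the model summary.
pm25_stanford  <- 8
resid_stanford <- -2.00402

# Predicted and implied observed rates (ED visits per 10,000).
log_pred         <- log_intercept + log_slope * pm25_stanford
predicted        <- exp(log_pred)
implied_observed <- exp(log_pred + resid_stanford)

# How many times larger the prediction is than the implied observation.
overestimate_factor <- predicted / implied_observed

print(round(c(predicted = predicted,
              implied_observed = implied_observed,
              factor = overestimate_factor), 2))
```

Under these assumptions the model predicts roughly 35 ED visits per 10,000 where the data imply roughly 5, an overestimate by a factor of about 7.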